Project Description

Studying abroad is an exciting experience, but it can also bring challenges. To understand these challenges better, a Japanese international university conducted a study on the mental health of its students.

In this project, we will use data manipulation skills to explore this study’s data. Our goal is to find out which factors have the strongest impact on students’ mental health when they study in a foreign country.

The university surveyed its students in 2018 and published the study in 2019. The study was approved by ethical and regulatory boards to ensure proper research standards.

The results showed that international students face a higher risk of mental health issues compared to domestic students. The study also found that social connectedness (how much a student feels part of a social group) and acculturative stress (stress caused by adjusting to a new culture) are important predictors of depression.

Data Description

Field Name      Description
inter_dom       Type of student: international or domestic
japanese_cate   Level of Japanese language proficiency
english_cate    Level of English language proficiency
academic        Current academic level: undergraduate or graduate
age             Age of the student
stay            Length of stay in Japan (in years)
todep           Total depression score (PHQ-9 test)
tosc            Total social connectedness score (SCS test)
toas            Total acculturative stress score (ASSIS test)

This project will help us understand how living and studying abroad affects mental health, and which factors we should pay attention to for better student support.

In [7]:
# import the required library
import pandas as pd
In [6]:
# load the dataset
df = pd.read_csv('students.csv', usecols=lambda column: column != "index")
df.head()
Out[6]:
inter_dom region gender academic age age_cate stay stay_cate japanese japanese_cate ... friends_bi parents_bi relative_bi professional_bi phone_bi doctor_bi religion_bi alone_bi others_bi internet_bi
0 Inter SEA Male Grad 24.0 4.0 5.0 Long 3.0 Average ... Yes Yes No No No No No No No No
1 Inter SEA Male Grad 28.0 5.0 1.0 Short 4.0 High ... Yes Yes No No No No No No No No
2 Inter SEA Male Grad 25.0 4.0 6.0 Long 4.0 High ... No No No No No No No No No No
3 Inter EA Female Grad 29.0 5.0 1.0 Short 2.0 Low ... Yes Yes Yes Yes No No No No No No
4 Inter EA Female Grad 28.0 5.0 1.0 Short 1.0 Low ... Yes Yes No Yes No Yes Yes No No No

5 rows × 50 columns
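The `usecols` argument in the loading cell is given a callable rather than a list of names. A minimal sketch of how that behaves, using a made-up three-row CSV in place of `students.csv` (the values and the `csv_text` and `demo` names are illustrative assumptions, not part of the dataset):

```python
import io

import pandas as pd

# Made-up miniature CSV standing in for students.csv (illustrative values only)
csv_text = """index,inter_dom,stay,todep
0,Inter,5,0
1,Inter,1,2
2,Dom,3,7
"""

# usecols accepts a callable; pandas keeps each column for which it returns True,
# so the redundant "index" column is skipped while everything else is loaded
demo = pd.read_csv(io.StringIO(csv_text), usecols=lambda column: column != "index")

print(demo.columns.tolist())
```

This avoids listing all 50 column names by hand just to leave one out.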

In [5]:
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 286 entries, 0 to 285
Data columns (total 50 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   inter_dom        268 non-null    object 
 1   region           268 non-null    object 
 2   gender           268 non-null    object 
 3   academic         268 non-null    object 
 4   age              268 non-null    float64
 5   age_cate         268 non-null    float64
 6   stay             268 non-null    float64
 7   stay_cate        268 non-null    object 
 8   japanese         268 non-null    float64
 9   japanese_cate    268 non-null    object 
 10  english          268 non-null    float64
 11  english_cate     268 non-null    object 
 12  intimate         260 non-null    object 
 13  religion         268 non-null    object 
 14  suicide          268 non-null    object 
 15  dep              270 non-null    object 
 16  deptype          271 non-null    object 
 17  todep            268 non-null    float64
 18  depsev           273 non-null    object 
 19  tosc             268 non-null    float64
 20  apd              268 non-null    float64
 21  ahome            268 non-null    float64
 22  aph              268 non-null    float64
 23  afear            268 non-null    float64
 24  acs              268 non-null    float64
 25  aguilt           268 non-null    float64
 26  amiscell         268 non-null    float64
 27  toas             268 non-null    float64
 28  partner          268 non-null    float64
 29  friends          268 non-null    float64
 30  parents          268 non-null    float64
 31  relative         268 non-null    float64
 32  profess          268 non-null    float64
 33  phone            268 non-null    float64
 34  doctor           268 non-null    float64
 35  reli             268 non-null    float64
 36  alone            268 non-null    float64
 37  others           268 non-null    float64
 38  internet         242 non-null    float64
 39  partner_bi       283 non-null    object 
 40  friends_bi       283 non-null    object 
 41  parents_bi       272 non-null    object 
 42  relative_bi      272 non-null    object 
 43  professional_bi  272 non-null    object 
 44  phone_bi         272 non-null    object 
 45  doctor_bi        272 non-null    object 
 46  religion_bi      272 non-null    object 
 47  alone_bi         272 non-null    object 
 48  others_bi        272 non-null    object 
 49  internet_bi      272 non-null    object 
dtypes: float64(26), object(24)
memory usage: 111.8+ KB
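The `df.info()` output shows non-null counts ranging from 242 to 283 across the 286 rows, so missing values are spread unevenly over the columns. One way to summarize that, sketched on a made-up frame (column names are borrowed from the dataset, the values are illustrative assumptions):

```python
import numpy as np
import pandas as pd

# Made-up frame; only the column names come from the dataset
demo = pd.DataFrame({
    "stay": [5.0, 1.0, np.nan, np.nan],
    "todep": [0.0, 2.0, np.nan, np.nan],
    "internet": [1.0, np.nan, np.nan, np.nan],
    "partner_bi": ["Yes", "No", "No", np.nan],
})

# Missing values per column, largest first
missing = demo.isna().sum().sort_values(ascending=False)

# Boolean mask of rows in which every column is missing
empty_rows = demo.isna().all(axis=1)

print(missing)
print("fully empty rows:", int(empty_rows.sum()))
```

Running the same two expressions on `df` would show which columns need attention before an analysis and whether any rows carry no data at all.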

This dataset holds a lot of detail, but rather than explore all of it, we will focus on a few specific questions.

Let's see how the length of stay relates to the average mental health diagnostic scores of the students in the study. Note that the code below does not filter on inter_dom, so the averages cover every respondent with a recorded stay.

In [37]:
# filtering the data to include the most relevant fields
stay_analysis = df[['stay', 'todep', 'tosc', 'toas']]
stay_analysis
Out[37]:
stay todep tosc toas
0 5.0 0.0 34.0 91.0
1 1.0 2.0 48.0 39.0
2 6.0 2.0 41.0 51.0
3 1.0 3.0 37.0 75.0
4 1.0 3.0 37.0 82.0
... ... ... ... ...
281 NaN NaN NaN NaN
282 NaN NaN NaN NaN
283 NaN NaN NaN NaN
284 NaN NaN NaN NaN
285 NaN NaN NaN NaN

286 rows × 4 columns
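The selection above keeps every row of `df`. If the analysis should be restricted to international students, a row filter on `inter_dom` can be combined with the column selection. This is a sketch on made-up data: the `"Inter"` label matches the `df.head()` output, while the `"Dom"` label and all values are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Made-up rows; "Inter" appears in df.head(), "Dom" is an assumed label
demo = pd.DataFrame({
    "inter_dom": ["Inter", "Inter", "Dom", np.nan],
    "stay": [5.0, 1.0, 3.0, np.nan],
    "todep": [0.0, 2.0, 7.0, np.nan],
    "tosc": [34.0, 48.0, 40.0, np.nan],
    "toas": [91.0, 39.0, 60.0, np.nan],
})

# Keep only international students, then select the fields of interest
inter_only = demo.loc[demo["inter_dom"] == "Inter", ["stay", "todep", "tosc", "toas"]]

print(inter_only)
```

Rows whose `inter_dom` is missing compare as False, so they are dropped by the same filter.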

In [38]:
# checking the null values in filtered data
stay_analysis.isna().sum()
Out[38]:
stay     18
todep    18
tosc     18
toas     18
dtype: int64
In [39]:
# the data was not preprocessed, so drop the rows where 'stay' is NaN, since the analysis is grouped by 'stay'
stay_analysis = stay_analysis.dropna(subset=['stay'])

# cast the float columns to int (the values shown above are whole numbers)
stay_analysis = stay_analysis.astype('int')
stay_analysis
Out[39]:
stay todep tosc toas
0 5 0 34 91
1 1 2 48 39
2 6 2 41 51
3 1 3 37 75
4 1 3 37 82
... ... ... ... ...
268 4 8 27 74
269 3 2 48 50
270 1 9 47 43
271 1 1 43 44
272 2 7 41 61

268 rows × 4 columns
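The order of the two steps in the cell above matters: NumPy integers cannot represent NaN, so the cast only works once the missing rows are gone. A small sketch on a made-up Series (the values are illustrative), which also shows pandas' nullable `Int64` dtype as an alternative when the rows should be kept:

```python
import numpy as np
import pandas as pd

# Made-up values; one entry is missing
scores = pd.Series([5.0, 1.0, np.nan], name="stay")

# Casting while NaN is present raises an error (a ValueError subclass)
try:
    scores.astype("int")
    cast_failed = False
except ValueError:
    cast_failed = True

# After dropping the missing value, the cast succeeds
clean = scores.dropna().astype("int")

# Alternative: the nullable integer dtype keeps the row and stores <NA>
nullable = scores.astype("Int64")

print(cast_failed, clean.tolist(), nullable.tolist())
```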

In [40]:
# group by length of stay and compute the group size and the mean of each score
stay_analysis = stay_analysis.groupby('stay').agg(
    no_of_students=('stay', 'count'),
    mean_depression=('todep', 'mean'),
    mean_social_connectedness=('tosc', 'mean'),
    mean_acculturative_stress=('toas', 'mean')
).round(2).sort_index(ascending=False)

stay_analysis
Out[40]:
no_of_students mean_depression mean_social_connectedness mean_acculturative_stress
stay
10 1 13.00 32.00 50.00
8 1 10.00 44.00 65.00
7 1 4.00 48.00 45.00
6 3 6.00 38.00 58.67
5 3 7.67 34.00 89.00
4 23 7.96 35.00 78.74
3 69 8.87 37.78 71.35
2 52 8.58 37.08 74.87
1 115 7.70 37.94 71.03
In [42]:
# visualize the findings to understand better
import matplotlib.pyplot as plt

stay_analysis[['mean_depression', 'mean_social_connectedness', 'mean_acculturative_stress']].plot(
    kind='bar', figsize=(12,6), width=0.8
)

# Labels & Title
plt.xlabel('Length of stay (years)')
plt.ylabel('Mean Values')
plt.title('Comparison of Depression, Social Connectedness & Acculturative Stress Across Stay')
plt.legend(title='Metrics')
plt.xticks(rotation=45)
plt.grid(axis='y', linestyle='--', alpha=0.7)
plt.show()
[Figure: bar chart comparing mean depression, social connectedness, and acculturative stress scores for each length of stay]
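In the Out[40] summary, the groups with a stay of five years or more contain only one to three students each, so their means rest on very little data. A possible way to set those groups aside before comparing averages is sketched below. The group sizes and stress means are copied from Out[40]; the `MIN_GROUP_SIZE` threshold is a hypothetical cut-off chosen for illustration, not something taken from the study.

```python
import pandas as pd

# Group sizes and mean acculturative stress copied from the Out[40] table
summary = pd.DataFrame(
    {
        "no_of_students": [1, 1, 1, 3, 3, 23, 69, 52, 115],
        "mean_acculturative_stress": [
            50.00, 65.00, 45.00, 58.67, 89.00, 78.74, 71.35, 74.87, 71.03,
        ],
    },
    index=pd.Index([10, 8, 7, 6, 5, 4, 3, 2, 1], name="stay"),
)

# Hypothetical cut-off for illustration only
MIN_GROUP_SIZE = 20

# Keep only the stay lengths backed by enough students
reliable = summary[summary["no_of_students"] >= MIN_GROUP_SIZE]

print(reliable)
```

With this threshold only stays of one to four years remain, which together account for most of the 268 respondents.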

Further analysis can answer many more questions. For example, we could study mental health by gender, age, or region, and visualize those results as well.
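The same named-aggregation pattern extends to other grouping columns. A minimal sketch grouping by `region` and `gender`, using made-up rows (the column names and category labels follow the `df.head()` output, while the scores are illustrative assumptions):

```python
import pandas as pd

# Made-up rows; labels follow df.head(), scores are illustrative
demo = pd.DataFrame({
    "region": ["SEA", "SEA", "EA", "EA"],
    "gender": ["Male", "Female", "Female", "Female"],
    "todep": [0, 2, 3, 5],
    "toas": [91, 39, 75, 82],
})

# One row per (region, gender) pair with the group size and mean scores
by_group = demo.groupby(["region", "gender"]).agg(
    no_of_students=("todep", "count"),
    mean_depression=("todep", "mean"),
    mean_acculturative_stress=("toas", "mean"),
).round(2)

print(by_group)
```

Keeping `no_of_students` in the result makes it easy to see which averages rest on only a handful of respondents.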